The rapid expansion of digital media platforms has accelerated the spread of fake news, fueling misinformation and related social challenges. Manual verification of online news content is impractical given the volume of data involved. This study proposes a machine learning approach to fake news detection based on natural language processing techniques. Experiments use the Fake and Real News dataset obtained from Kaggle. The text is first preprocessed to remove noise, and features are then extracted with Term Frequency–Inverse Document Frequency (TF-IDF). Four classifiers are implemented and evaluated: Naive Bayes, Logistic Regression, Support Vector Machine (SVM), and Random Forest. Experimental results indicate that SVM and Random Forest achieve higher accuracy than Naive Bayes and Logistic Regression. These results demonstrate the effectiveness of machine learning techniques for automated fake news detection.
Introduction
The widespread use of the internet and social media has made fake news a growing problem, and the sheer volume of online content makes manual detection impractical. To address this, the study explores machine learning and natural language processing techniques that detect fake news automatically by analyzing textual features.
Existing literature shows that deep learning approaches achieve high accuracy but are computationally complex, while comparative studies of traditional machine learning methods remain limited. To fill this gap, this study compares several machine learning algorithms within a single, consistent experimental framework.
The study uses a Kaggle dataset of news articles labeled as fake or real. After the text was preprocessed and TF-IDF features were extracted, four classifiers were trained and evaluated by classification accuracy: Naive Bayes, Logistic Regression, Support Vector Machine (SVM), and Random Forest.
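The preprocessing and TF-IDF steps described above can be sketched in plain Python. This is a minimal illustration under assumptions the study does not state: the cleaning rules (lowercasing, stripping URLs and non-letter characters, removing a small hypothetical stop-word list) and the smoothed IDF variant idf(t) = ln((1 + N) / (1 + df(t))) + 1, which matches the default of scikit-learn's `TfidfVectorizer`, are illustrative choices rather than the paper's documented configuration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (an assumption for illustration).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in", "and", "on", "for"}


def preprocess(text: str) -> list[str]:
    """Lowercase, strip URLs and non-letter characters, tokenize, drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only ASCII letters and whitespace
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one L2-normalized TF-IDF vector (a sparse dict) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        vec = {term: count * idf[term] for term, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0  # avoid dividing by 0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


# Usage on two made-up headlines (not drawn from the Kaggle dataset).
corpus = [
    "BREAKING: Scientists SHOCKED by this one trick! http://example.com",
    "The central bank kept interest rates unchanged on Tuesday.",
]
vectors = tfidf([preprocess(doc) for doc in corpus])
```

Each article becomes a sparse mapping from terms to weights, and terms that occur in fewer documents receive larger weights, so words common to both fake and real articles contribute less. In practice a library vectorizer such as scikit-learn's `TfidfVectorizer` would typically replace this hand-written version.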
All models performed well. SVM achieved the highest accuracy (97.2%), followed by Random Forest (96.5%), while Naive Bayes and Logistic Regression scored slightly lower. The study therefore concludes that, on this dataset, SVM and Random Forest are more effective for fake news detection than Naive Bayes and Logistic Regression.
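The comparison above is meaningful only if every classifier sees the same features, the same train/test split, and the same metric, where accuracy = correct predictions / total predictions. The sketch below illustrates such an evaluation harness in plain Python. Everything in it is hypothetical: the toy documents, the `evaluate` helper, and the two stand-in models (a majority-class baseline and a from-scratch multinomial Naive Bayes on raw token counts with add-one smoothing). It does not reproduce the study's SVM, Random Forest, or Logistic Regression models or its Kaggle split, and the toy scores say nothing about the reported 97.2% and 96.5% figures.

```python
import math
from collections import Counter
from typing import Callable

Tokens = list[str]
Model = Callable[[Tokens], str]
Trainer = Callable[[list[Tokens], list[str]], Model]


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def train_majority(X: list[Tokens], y: list[str]) -> Model:
    """Baseline: always predict the most frequent training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda doc: label


def train_naive_bayes(X: list[Tokens], y: list[str]) -> Model:
    """Multinomial Naive Bayes on raw token counts with Laplace (add-one) smoothing."""
    labels = sorted(set(y))
    vocab = {tok for doc in X for tok in doc}
    log_prior = {c: math.log(y.count(c) / len(y)) for c in labels}
    counts = {c: Counter() for c in labels}
    for doc, c in zip(X, y):
        counts[c].update(doc)
    totals = {c: sum(counts[c].values()) for c in labels}

    def log_posterior(doc: Tokens, c: str) -> float:
        # Unnormalized log P(c | doc); tokens never seen in training are skipped.
        return log_prior[c] + sum(
            math.log((counts[c][tok] + 1) / (totals[c] + len(vocab)))
            for tok in doc
            if tok in vocab
        )

    return lambda doc: max(labels, key=lambda c: log_posterior(doc, c))


def evaluate(trainers: dict[str, Trainer], X_train, y_train, X_test, y_test) -> dict[str, float]:
    """Train every model on the same split and score each with the same metric."""
    results = {}
    for name, train in trainers.items():
        model = train(X_train, y_train)
        results[name] = accuracy(y_test, [model(doc) for doc in X_test])
    return results


# Toy, already-tokenized data (hypothetical, for illustration only).
X_train = [
    ["shocking", "secret", "cure", "revealed"],
    ["miracle", "cure", "doctors", "hate"],
    ["shocking", "truth", "hidden", "secret"],
    ["parliament", "passes", "budget", "bill"],
    ["central", "bank", "holds", "rates"],
]
y_train = ["fake", "fake", "fake", "real", "real"]
X_test = [["secret", "miracle", "cure"], ["bank", "passes", "budget"]]
y_test = ["fake", "real"]

results = evaluate(
    {"majority": train_majority, "naive_bayes": train_naive_bayes},
    X_train, y_train, X_test, y_test,
)
print(results)  # → {'majority': 0.5, 'naive_bayes': 1.0}
```

Holding the features, split, and metric fixed is what makes accuracy figures comparable across models. In a full experiment, further classifiers such as SVM, Random Forest, and Logistic Regression could be added as extra entries in `trainers`, for example by wrapping scikit-learn estimators.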
Conclusion
This study presented a comparative analysis of machine learning algorithms for fake news detection using natural language processing techniques. The results indicate that Support Vector Machine and Random Forest identify fake news more accurately than Naive Bayes and Logistic Regression. These findings demonstrate the potential of machine learning for combating misinformation.
Future work can focus on applying deep learning models such as LSTM and transformer-based architectures. Additionally, incorporating metadata and real-time data sources may further improve detection accuracy.